Search for: All records

Creators/Authors contains: "Shi, K"

  1. Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. Among the approaches we explored, we find one that is both simple and Pareto-optimal: conditional training, that is, learning a distribution over tokens conditional on their human preference scores as given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.
  2. The application of the Young–Laplace equation to a solid–liquid interface is considered. Computer simulations show that the pressure inside a solid cluster of hard spheres is smaller than the external pressure of the liquid (both for small and large clusters). This would suggest a negative value for the interfacial free energy. We show that in a Gibbsian description of the thermodynamics of a curved solid–liquid interface in equilibrium, the choice of the thermodynamic (rather than mechanical) pressure is required, as suggested by Tolman for the liquid–gas scenario. With this definition, the interfacial free energy is positive, and the values obtained are in excellent agreement with previous results from nucleation studies. Although, for a curved fluid–fluid interface, there is no distinction between mechanical and thermal pressures (for a sufficiently large inner phase), in the solid–liquid interface, they do not coincide, as hypothesized by Gibbs. 
  3. Free, publicly-accessible full text available September 1, 2026
  4. Integrating renewable energy into manufacturing facilities is the key to realising carbon-neutral operations. Although many firms have taken various initiatives to reduce the carbon footprint of their facilities, few quantitative studies have focused on the cost analysis and supply reliability of integrating intermittent wind and solar power. This paper aims to fill this gap by addressing the following question: should a firm adopt a power purchase agreement (PPA) or onsite renewable generation to realise the eco-economic benefits? We tackle this complex decision-making problem by considering two regulatory options: government carbon incentives and utility pricing policy. A stochastic programming model is formulated to search for the optimal mix of onsite and offsite renewable power supply. The model is tested extensively in different regions under various climatic conditions. Three findings are obtained. First, in the long term, onsite generation and PPA can avoid the price volatility of the spot or wholesale electricity market. Second, at locations where the wind speed is below 6 m/s, PPA at $70/MWh is preferred over onsite wind generation. Third, compared to PPA and wind generation, solar generation is not economically competitive unless the capacity cost falls below USD 1.5 M per MW.
  5. Some arsenite [As(III)]-oxidizing bacteria exhibit positive chemotaxis towards As(III); however, the related As(III) chemoreceptor and regulatory mechanism remain unknown. The As(III)-oxidizing bacterium Agrobacterium tumefaciens GW4 displays positive chemotaxis towards 0.5–2 mM As(III). Genomic analyses revealed a putative chemoreceptor-encoding gene, mcp, located in the arsenic gene island and having a predicted promoter binding site for the As(III) oxidation regulator AioR. Expression of mcp and other chemotaxis-related genes (cheA, cheY2 and fliG) was inducible by As(III) in the wild type, but not in the aioR mutant. Using capillary assays and intrinsic tryptophan fluorescence spectra analysis, Mcp was confirmed to be responsible for chemotaxis towards As(III) and to bind As(III) (but not As(V) or phosphate) as part of the sensing mechanism. A bacterial one-hybrid system technique and electrophoretic mobility shift assays showed that AioR interacts with the mcp regulatory region in vivo and in vitro, and the precise AioR binding site was confirmed using DNase I footprinting. Taken together, these results indicate that this Mcp is responsible for the chemotactic response towards As(III) and is regulated by AioR. Additionally, disrupting the mcp gene affected bacterial As(III) oxidation and growth, implying that Mcp may provide a functional connection between As(III) oxidation and As(III) chemotaxis.
  6. A search is presented for fractionally charged particles with charges below $1e$, using their small energy loss in the tracking detector as a key variable to observe a signal. The analyzed dataset corresponds to an integrated luminosity of 138 fb$^{-1}$ of proton-proton collisions collected at $\sqrt{s} = 13$ TeV in 2016–2018 at the CERN LHC. This is the first search at the LHC for new particles with a charge between $e/3$ and $0.9e$, including an extension of previous results at a charge of $2e/3$. Masses up to 640 GeV and charges as low as $e/3$ are excluded at 95% confidence level. These are the most stringent limits to date for the considered Drell-Yan-like production mode.
    Free, publicly-accessible full text available April 1, 2026
  7. Differential cross sections for top quark pair ($\textrm{t}\overline{\textrm{t}}$) production are measured in proton-proton collisions at a center-of-mass energy of 13 TeV using a sample of events containing two oppositely charged leptons. The data were recorded with the CMS detector at the CERN Large Hadron Collider and correspond to an integrated luminosity of 138 fb$^{-1}$. The differential cross sections are measured as functions of kinematic observables of the $\textrm{t}\overline{\textrm{t}}$ system, the top quark and antiquark and their decay products, as well as of the number of additional jets in the event. The results are presented as functions of up to three variables and are corrected to the parton and particle levels. When compared to standard model predictions based on quantum chromodynamics at different levels of accuracy, it is found that the calculations do not always describe the observed data. The deviations are found to be largest for the multi-differential cross sections.
    Free, publicly-accessible full text available February 1, 2026
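
The conditional training objective summarised in record 1 can be illustrated with a small data-preparation sketch. The abstract only states that the LM learns a distribution over tokens conditional on reward-model scores; everything else below is an assumption made for illustration: the control tokens `<|good|>` and `<|bad|>`, the segment-level threshold, and the helper names `tag_segments`, `toy_reward` and `conditioned_prompt` are hypothetical, and `toy_reward` is a keyword check standing in for a real reward model.

```python
from typing import Callable, List

# Hypothetical control tokens; the abstract does not name any.
GOOD = "<|good|>"
BAD = "<|bad|>"


def tag_segments(
    segments: List[str],
    reward_fn: Callable[[str], float],
    threshold: float,
) -> str:
    """Prefix each segment with a control token chosen from its reward score.

    A language model trained with the ordinary next-token loss on the
    returned text sees every segment conditioned on its preference label.
    """
    tagged = []
    for segment in segments:
        token = GOOD if reward_fn(segment) >= threshold else BAD
        tagged.append(token + segment)
    return "".join(tagged)


def toy_reward(segment: str) -> float:
    """Stand-in for a reward model: penalise one placeholder word."""
    return -1.0 if "darn" in segment.lower() else 1.0


def conditioned_prompt(prompt: str = "") -> str:
    """At generation time, condition on the 'good' token before the prompt."""
    return GOOD + prompt


if __name__ == "__main__":
    document = ["The sky is blue. ", "Darn this weather. "]
    print(tag_segments(document, toy_reward, threshold=0.0))
    print(conditioned_prompt("Tell me about the weather."))
```

Under these assumptions the model still sees the undesirable text during pretraining, but always paired with the `<|bad|>` label, so sampling after `<|good|>` steers generation towards the preferred distribution without discarding training data.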